These rare, giant millipedes only exist in Florida

Popular Science

When a graduate found a baby Florida scrub millipede, she put it in a kiddie pool. Then it got busy reproducing. While Florida is perhaps best known for its beaches and wetlands, its landscape hosts other notable features: ridges. Millions of years ago, sea levels were higher than they are today, and these elevated areas of land became like islands.


Space Explanations of Neural Network Classification

Labbaf, Faezeh, Kolárik, Tomáš, Blicha, Martin, Fedyukovich, Grigory, Wand, Michael, Sharygina, Natasha

arXiv.org Artificial Intelligence

Explainability of decision-making AI systems (XAI), and specifically of neural networks (NNs), is a key requirement for deploying AI in sensitive areas [18]. A recent trend in explaining NNs builds on formal methods and logic, providing explanations for the decisions of machine learning systems [24, 31, 32, 41, 42, 44] together with provable guarantees of their correctness. Yet rigorous exploration of the continuous feature space requires estimating decision boundaries with complex shapes. This remains a challenge because existing explanations [24, 31, 32, 41, 42, 44] constrain only individual features and hence fail to capture relationships among features that are essential for understanding the reasons behind a multi-parametrized classification. We address the need for NN interpretations that are as meaningful as possible with a novel concept of Space Explanations, delivered by a flexible symbolic reasoning framework with Craig interpolation [12] at the heart of its machinery.
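As a reminder of the underlying notion (the textbook definition, not the paper's own construction): if A ∧ B is unsatisfiable, a Craig interpolant I is a formula over only the symbols shared by A and B such that A implies I and I ∧ B is unsatisfiable. A toy instance over linear arithmetic, where the formulas are chosen purely for illustration:

```latex
\begin{aligned}
A &\equiv (x \ge 1) \land (y = 2x), \qquad B \equiv (y \le 1),\\
I &\equiv (y \ge 2) \quad \text{(mentions only the shared variable } y\text{)},\\
A &\models I \quad \text{since } x \ge 1 \text{ and } y = 2x \text{ give } y \ge 2,\\
I \land B &\models \bot \quad \text{since } y \ge 2 \text{ and } y \le 1 \text{ cannot both hold.}
\end{aligned}
```

Because an interpolant may be any formula over the shared symbols, it can in principle relate several features at once (for instance a constraint such as x_1 + x_2 ≤ c), rather than bounding each feature separately.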


Towards General Loop Invariant Generation: A Benchmark of Programs with Memory Manipulation (Supplementary Material)

Neural Information Processing Systems

Dataset Documentation: We have documented our dataset for the intended researchers, as required. The link to download the fine-tuned models is https://mega.nz/file/M9FEWCjD#

To address the lack of benchmarks for general loop invariant generation, we propose LIG-MM, a loop invariant generation benchmark of memory manipulation programs. Table 1 summarizes basic statistics of the code in LIG-MM, and multiple examples are shown in Sec. 3.

Table 1: Statistics of our proposed LIG-MM benchmark.
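To make the notion of a loop invariant over heap-manipulating code concrete, here is a minimal Python sketch that is not taken from LIG-MM. The `Node` type, the helper names and the candidate invariant are all hypothetical. The asserts only check the invariant on concrete runs; a verifier would have to prove it for every input and every heap shape.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A singly linked list cell (hypothetical example type)."""
    value: int
    next: Optional["Node"] = None


def build(values: List[int]) -> Optional[Node]:
    """Build a linked list holding `values` in order."""
    head: Optional[Node] = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def segment_length(node: Optional[Node]) -> int:
    """Number of cells reachable from `node` (assumes the list is acyclic)."""
    n = 0
    while node is not None:
        n += 1
        node = node.next
    return n


def length_with_invariant(head: Optional[Node]) -> int:
    """Compute the list length while checking a candidate loop invariant."""
    total = segment_length(head)  # "ghost" value used only by the checks
    count, cur = 0, head
    # Candidate invariant: cells already visited + cells still ahead == total.
    assert count + segment_length(cur) == total  # holds on loop entry
    while cur is not None:
        count += 1
        cur = cur.next
        assert count + segment_length(cur) == total  # preserved by each iteration
    # On exit cur is None, so the invariant yields count == total.
    return count


print(length_with_invariant(build([3, 1, 4])))  # 3
```

The invariant is what lets a prover connect the loop to the function's postcondition: at exit, `segment_length(None)` is 0, so the invariant reduces to `count == total`.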



Cost-Driven Synthesis of Sound Abstract Interpreters

Gu, Qiuhan, Singh, Avaljot, Singh, Gagandeep

arXiv.org Artificial Intelligence

Constructing abstract interpreters that provide global soundness guarantees remains a major obstacle in abstract interpretation. We investigate whether modern LLMs can reduce this burden by leveraging them to synthesize sound, non-trivial abstract interpreters across multiple abstract domains in the setting of neural network verification. We formulate synthesis as a constrained optimization problem and introduce a novel, mathematically grounded cost function for measuring unsoundness under strict syntactic and semantic constraints. Based on this formulation, we develop a framework that unifies LLM-based generation with syntactic and semantic validation and a quantitative, cost-guided feedback mechanism. Empirical results demonstrate that our framework not only matches the quality of handcrafted transformers but, more importantly, discovers sound, high-precision transformers for complex nonlinear operators that are absent from the existing literature.
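The idea of scoring candidate transformers by how unsound they are can be illustrated with a deliberately simple, hypothetical measure. It is not the paper's cost function, which the abstract does not spell out. The sketch uses the interval domain and a ReLU transformer, sums how far sampled concrete outputs fall outside the abstract output, and can therefore only expose unsoundness, never prove soundness.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]


def relu(x: float) -> float:
    return max(0.0, x)


def sound_relu(l: float, u: float) -> Interval:
    """Standard interval transformer for ReLU: [max(0, l), max(0, u)]."""
    return max(0.0, l), max(0.0, u)


def buggy_relu(l: float, u: float) -> Interval:
    """Deliberately unsound candidate: keeps u as upper bound even when u < 0."""
    return max(0.0, l), u


def violation_cost(transformer: Callable[[float, float], Interval],
                   boxes: List[Interval], samples: int = 101) -> float:
    """Hypothetical unsoundness penalty: total distance of sampled concrete
    outputs outside the abstract output (0.0 means no violation was found)."""
    cost = 0.0
    for l, u in boxes:
        lo, hi = transformer(l, u)
        for i in range(samples):
            x = l + (u - l) * i / (samples - 1)  # evenly spaced points in [l, u]
            y = relu(x)
            cost += max(0.0, lo - y) + max(0.0, y - hi)
    return cost


boxes = [(-2.0, 3.0), (-5.0, -1.0), (0.5, 2.0)]
print(violation_cost(sound_relu, boxes))  # 0.0
print(violation_cost(buggy_relu, boxes))  # 101.0
```

In a feedback loop of the kind the abstract describes, a nonzero score like this could be returned to the generator together with the offending input box. A zero score is only weak evidence, since sampling can miss violations that a formal check would find.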




'People thought I was a communist doing this as a non-profit': is Wikipedia's Jimmy Wales the last decent tech baron?

The Guardian

In an online landscape characterised by doom and division, the people's encyclopedia stands out: a huge collective endeavour giving everyone free access to the sum of human knowledge. But with Elon Musk branding it 'Wokipedia' and AI looming large, can it survive?

Wikipedia will be 25 years old in January. Jimmy Wales's daughter will be 25 and three weeks. It's not a coincidence: on Boxing Day 2000, Wales's then wife, Christine, gave birth to a baby girl, but it quickly became clear that something wasn't right. She had breathed in contaminated amniotic fluid, resulting in a life-threatening condition called meconium aspiration syndrome. An experimental treatment was available at the hospital near where they lived in San Diego. Did they want to try it?


Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering

Calvo-Bartolomé, Lorena, Aldana, Valérie, Cantarero, Karla, de Mesa, Alonso Madroñal, Arenas-García, Jerónimo, Boyd-Graber, Jordan

arXiv.org Artificial Intelligence

Multilingual question answering (QA) systems must ensure factual consistency across languages, especially for objective queries such as "What is jaundice?", while also accounting for cultural variation in subjective responses. We propose MIND, a user-in-the-loop fact-checking pipeline that detects factual and cultural discrepancies in multilingual QA knowledge bases. MIND highlights divergent answers to culturally sensitive questions (e.g., "Who assists in childbirth?") that vary by region and context. We evaluate MIND on a bilingual QA system in the maternal and infant health domain and release a dataset of bilingual questions annotated for factual and cultural inconsistencies. We further test MIND on datasets from other domains to assess generalization. In all cases, MIND reliably identifies inconsistencies, supporting the development of more culturally aware and factually consistent QA systems.


Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification?

Richter, Cedric, Wehrheim, Heike

arXiv.org Artificial Intelligence

Automatic software verifiers have become increasingly effective at checking software against (formal) specifications. Yet their adoption in practice has been hampered by the lack of such specifications in real-world code. Large Language Models (LLMs) have shown promise in inferring formal postconditions from natural language hints embedded in code, such as function names, comments or documentation. Using the generated postconditions as specifications in a subsequent verification, however, often leads verifiers to suggest invalid inputs, hinting at potential issues that ultimately turn out to be false alarms. To address this, we revisit the problem of specification inference from natural language in the context of automatic software verification. In the process, we introduce NL2Contract, the task of employing LLMs to translate informal natural language into formal functional contracts consisting of both postconditions and preconditions. We introduce metrics to validate and compare different NL2Contract approaches, focusing on soundness, the bug-discriminative power of the generated contracts, and their usability in automatic software verification. We evaluate NL2Contract with different LLMs and compare it to the task of postcondition generation, nl2postcond. Our evaluation shows that (1) LLMs are generally effective at generating functional contracts that are sound for all possible inputs, (2) the generated contracts are sufficiently expressive to discriminate buggy from correct behavior, and (3) verifiers supplied with LLM-inferred functional contracts produce fewer false alarms than when provided with postconditions alone. Further investigation shows that LLM-inferred preconditions generally align well with developers' intentions, which allows us to use automatic software verifiers to catch real-world bugs.
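A rough runtime analogue of the distinction drawn above: a contract pairs a precondition with a postcondition, so inputs outside the intended domain are rejected rather than reported as bugs. The `contract` decorator, the exception names and `buggy_isqrt` are hypothetical illustrations, not part of NL2Contract, and a static verifier would reason about all inputs symbolically instead of running examples.

```python
import math
from typing import Callable, List, Tuple


class PreconditionViolation(Exception):
    """Input lies outside the function's intended domain (not a bug)."""


class PostconditionViolation(Exception):
    """Result breaks the contract for a valid input (a genuine bug)."""


def contract(pre: Callable[..., bool], post: Callable[..., bool]):
    """Hypothetical decorator that checks a pre/postcondition pair at runtime."""
    def wrap(fn):
        def checked(*args):
            if not pre(*args):
                raise PreconditionViolation(f"{fn.__name__}{args}")
            result = fn(*args)
            if not post(result, *args):
                raise PostconditionViolation(f"{fn.__name__}{args} -> {result}")
            return result
        return checked
    return wrap


def isqrt_pre(n: int) -> bool:
    return n >= 0  # intended domain: non-negative integers


def isqrt_post(r: int, n: int) -> bool:
    return r * r <= n < (r + 1) * (r + 1)  # r is the integer square root of n


@contract(isqrt_pre, isqrt_post)
def isqrt(n: int) -> int:
    return math.isqrt(n)


@contract(isqrt_pre, isqrt_post)
def buggy_isqrt(n: int) -> int:
    return n // 2  # wrong for most n


def classify(fn, inputs: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split inputs into (passed, rejected by precondition, contract bugs)."""
    passed, rejected, bugs = [], [], []
    for x in inputs:
        try:
            fn(x)
            passed.append(x)
        except PreconditionViolation:
            rejected.append(x)
        except PostconditionViolation:
            bugs.append(x)
    return passed, rejected, bugs


print(classify(isqrt, [-4, 0, 15, 16]))        # ([0, 15, 16], [-4], [])
print(classify(buggy_isqrt, [-4, 0, 15, 16]))  # ([0], [-4], [15, 16])
```

With the precondition replaced by one that accepts everything, the input -4 would reach `math.isqrt`, which raises `ValueError`. This roughly mirrors a verifier flagging an invalid input as a false alarm when it is given a postcondition alone.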